(54) Abstract Title

System for synchronous display of text and audio data 



(57) A system for synchronously displaying text and audio data comprises display and audio output devices 
and a user controlled input device. In use the system will synchronise a speech output from the audio device 
with a word by word text display on the display device. The user can control the speed, style and position 
within the text using the input device. Preferably the input device comprises either a small keypad similar to 
those on a mobile phone or on screen buttons which are activated by a mouse and which allow a user to 
navigate through a hierarchical menu. Typically the display may be a small LCD panel on which the size of the 
words displayed can fill the screen. In other embodiments of the system the output means can display the text 
as Braille or as symbols or signs as used in sign language.

At least one drawing originally filed was informal and the print reproduced here is taken from a later filed formal copy.

This print takes account of replacement documents submitted after the date of filing to enable the application to comply

SPECIFICATION OF SERIAL PRESENTATION SYSTEM

1. Background and inventive aspects

1.1 Mobile devices 

There is a real problem with current mobile information devices, such as mobile phones, 
palmtops, and PDAs (personal digital assistants), concerning the ability to present information 
in a universally accessible way, and allow the user to browse through information quickly and 
efficiently. The problems arise from the small screen and limited physical space for buttons. 

The WAP approach has information marked up on a web site and presented in small chunks, 
and the user has to step from one chunk to another. In the current invention, the information is 
analysed at the time of access for syntactic units and for structural and style elements, typically 
marked up in HTML. The larger syntactic units, such as paragraphs and sentences, may be 
divided into smaller units and ultimately into words and individual characters. Words or 
groups of words are displayed serially, where the time between successive displays is 
dependent either on synchronisation with the speaking of the words using a speech synthesiser 
or on an approximation to the time it would take if they were spoken. The speed is set by the 
user so that a whole document can be read without user intervention, as compared to WAP 
where the user has to step from one chunk to another. But the user can also step through units 
of different sizes, to scan the document quickly, and get from one point to another in a way 
which is based on the meaning of the document, rather than how it would appear on the page. 
For example they can step sentences rather than lines. 
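
As an illustration of the timing approximation described above, the following is a minimal sketch in Python. It assumes that the spoken duration of a word scales with its length in characters. The function names, the default rate of 180 words per minute and the five-character average word are assumptions made for illustration and are not taken from the specification.

```python
def display_time(word, words_per_minute=180):
    """Approximate the time in seconds that the word would take to speak.

    Assumption: an average word has five characters and duration scales
    linearly with length, with a floor so that short words stay visible.
    """
    average = 60.0 / words_per_minute
    return max(0.5 * average, average * len(word) / 5.0)


def schedule(text, words_per_minute=180):
    """Return a list of (word, seconds) pairs for serial display."""
    return [(w, display_time(w, words_per_minute)) for w in text.split()]
```

A display loop could show each word in turn and wait for the paired number of seconds before moving on; where a synthesiser is present, its own word events would replace these estimates.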

With the word-by-word display, or display in a small group of words, it is possible to have 
relatively large characters on the screen, thus improving legibility, speed of reading, and 
accessibility to people with visual impairment. By allowing synchronised speech, accessibility 
is given to people who may be blind, dyslexic, illiterate, or otherwise "print disabled". This 
increases the market for the mobile device. 

The display of the invention has a main field for reading the document, and subsidiary fields 
in which structural elements of a document, such as headings, and presentation values, such as 
speed, can be shown. However these elements and values can be read out in the main display. 
This caters for the situation where the display has little or no space for subsidiary fields. 

Another aspect of the current invention concerns the control of such a device, and allows 
control of serial presentation and stepping all using a few keys or buttons. A state machine 
can be used in the implementation, and the interface can be dynamically changed (e.g. 
simplified to fewer levels for a child) by changing the definition of the states and transitions to 
which the state machine is working. 

1.2 Reading and writing 

Another area of application of the invention is as a reading and writing aid. In this case there 
may be no limitation on screen size or space for buttons, and a conventional computer monitor 
and keyboard may be used. The invention allows print disabled people to read. Through 
combination of the invention with an editor, it allows such people to write as well as to read. 

Writing using just four buttons is possible using the means described below, see 3.10.

1.3 Learning to read, speak and write - or to communicate in other ways

The invention can be used as a language learning aid. For example where the language of the
synthesiser is not the first language of the user, the user is able to see the words and hear them at
the same time, and associate sight with sound. The user can write words and immediately hear
them spoken. Thus the invention can be used in the teaching of, say, English as a foreign
language. It can be used in teaching literacy skills to children with special needs. An
embodiment below describes the system in terms of adult and child usage in this context. The
simplicity of operation by the child using the cursor keys is important in this context.

The invention has other optional output modalities. There is a tactile modality - which may be 
dynamic refreshable Braille, or it may be a code for serial tactile output of characters or
phonemes across fingers and thumb. There is a symbol modality, when words are associated 
with symbols, according to the PCS or Rebus symbol sets for example. And there is a sign 
language modality, when words are associated with signs, according to British Sign Language 
(BSL) for example. 

The invention allows output of a word in different modalities to be synchronised. As a word is 
displayed or spoken, the corresponding symbol (if there is one) is displayed in a separate 
window. There is a database of symbols associated with words in a wordlist. In the case 
where there are several symbols for a single word, such as "present", the choices can be 
displayed together. However the disambiguation can be recognised in mark-up, so that the 
appropriate choice of symbol is displayed. 
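
A minimal sketch of the word-to-symbol lookup follows. It assumes a small in-memory table in place of the database, and a sense label supplied by the mark-up; the table contents, file names and sense labels are all hypothetical.

```python
# Hypothetical symbol table: word -> {sense label: symbol file}.
# A sense of None marks a word with a single, unambiguous symbol.
SYMBOLS = {
    "present": {"gift": "present-gift.png", "now": "present-now.png"},
    "cat": {None: "cat.png"},
}


def symbols_for(word, sense=None):
    """Return the symbols to display alongside a word.

    If the mark-up supplies a sense that the table knows, only that symbol
    is returned; otherwise all the choices are returned for display together.
    Words with no symbol give an empty list.
    """
    entries = SYMBOLS.get(word.lower(), {})
    if sense in entries:
        return [entries[sense]]
    return sorted(entries.values())
```

With this table, symbols_for("present") returns both choices, while symbols_for("present", "gift") returns only the symbol for that sense.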

A similar arrangement is possible with sign language, where there is a database of signs 
corresponding to words. As a word of text is displayed or spoken, the corresponding sign is 
displayed in parallel and synchrony. 

1.4 Web application 

In a certain embodiment, the speech synthesiser is mounted on the web server, but the 
presentation is still controlled from the user's client. Speech is sent from the server as an 
encoded file, such as a ".wav" file, to the client. However this file, or a parallel file, has extra 
coding to mark word boundaries, allowing synchronisation of serial visual display with the 
speech. Typically the server sends a sentence in advance of the sentence that the user is 
reading. The files are stored in the client, for reuse, in case the user wants to step backwards a 
sentence or more. The user can step forwards and backwards a word at a time, and the 
relevant word is extracted from the file, using the word boundary markings. 

In another embodiment, the speech synthesiser is split into a front end and a back end, with the 
text converted to a phonetic notation by the front end, and the notation passed from the server
to the client machine and then converted to speech sounds on the user's client. The former 
conversion may be done at an earlier stage, so that a web page contains both a text version and 
a corresponding static phonetic version. The latter conversion may be performed by an applet 
previously downloaded from the web site, or by a program allowing the conversion of a 
stream of standard phonetic notation from any web site. 

Such a notation might be used in a pronunciation dictionary. It would typically contain 
allophones and syllable stress markings, but also word stress and intonation, in order to give a 
complete inflection for each word, and thus for the sentence as a whole.

The notation can allow a reverse translation, back to words of text, so that the software (e.g.
applet) on the client can display each word as its synthesised sound is output - i.e. the word is
displayed at the same time as it is spoken.

The notation can also disambiguate words that have several meanings. For example it can use 
mark-up similar to that proposed for the semantic web. This disambiguation is useful if there 
is a choice of symbol according to the meaning. 

This system allows a web site to be made accessible to a visitor without the need for assistive 
technology. The functionality on the user client can be implemented as an applet, which can 
be downloaded from the site. If spoken sentences are downloaded as compressed audio in 
advance of reading, it only requires a bandwidth sufficient for this audio stream, and the 
system is able to keep up with a reader reading through the page. The synthesiser can be 
implemented as a servlet, for portability across different server platforms. 

Note that the storing of a sentence as a file with word boundaries has advantages in this and 
other embodiments, because the speech synthesiser can work out inflection and pronunciation 
for each word appropriate for the sentence as a whole. Thus the user can stop, start and step 
through the sentence while this inflection and pronunciation is maintained for each word. 
Normally if you give a synthesiser a part of a sentence or a single word, it cannot reproduce 
the inflection and pronunciation it would have given in the context of a complete sentence. 

The same approach can be used with pre-recorded speech, such as produced by an actor for a 
talking book. This speech is divided typically into sentences, and each sentence is marked up
with word boundaries so that the display can subsequently be synchronised with the speech. 
The mark up may be virtual, e.g. by a time indication of when each word ends with respect to 
the beginning of the sentence. 
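
The virtual mark-up described above, in which each word has an end time relative to the start of the sentence, could be used as in the following sketch. The class and method names are assumptions for illustration, and times are taken to be in seconds.

```python
from bisect import bisect_right


class MarkedSentence:
    """A sentence plus the end time of each word, measured from sentence start."""

    def __init__(self, words, end_times):
        if len(words) != len(end_times):
            raise ValueError("one end time is needed per word")
        self.words = list(words)
        self.end_times = list(end_times)

    def word_index_at(self, t):
        """Index of the word being spoken at time t, clamped to the last word."""
        return min(bisect_right(self.end_times, t), len(self.words) - 1)

    def word_span(self, i):
        """(start, end) time of word i, e.g. for replaying that word alone."""
        start = self.end_times[i - 1] if i > 0 else 0.0
        return (start, self.end_times[i])
```

The display would call word_index_at with the current playback time to choose the word to show, and stepping a word at a time would play only the span returned by word_span.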

1.5 Maintaining context 

It is important that the reader is aware of current context. For this reason it is useful to be
able to read the title of the current document, the proportion of the document that has been 
read, the heading of the current section, and the closest previous link. These values are 
maintained while the position of presentation (i.e. where you are reading in the document) 
changes. For example, when searching through a document for instances of a particular word, 
as each instance is examined, the context is provided. 

In another embodiment, the reader is presented with just two levels in the tree: the current level 
on which the operations apply, and the level above, which gives the user the context. For 
example when setting the year, the word "year" might be seen above its value, say "2001", 
see figure 5.

1.6 Consistent presentation 

With this invention it is possible to treat all information in the same way, i.e. with the same, 
serial presentation, and the same means of control. Structural information and sets of 
presentation parameters (or style sheets) can be treated as meta-documents associated with the
"principal" document, i.e. the document being read. The meta-documents are in effect
sub-trees of a main tree, for example as shown in Figure 5.

2. Description of invention

The system comprises one or more serial computer output means, and one or more computer 
input means by which the presentation of the serial output is controlled and by which the 
content of that output is selected; where the content can be a document, and the presentation 
can be controlled by the input with respect to style, speed, and position within the document; 
and the selection of document for presentation involves navigation of documents associated 
through a directory structure and/or hypertext linkage. 

The serial output means typically comprises a word-by-word serial visual presentation 
synchronised with the speaking of the said words by a speech synthesiser. The control input 
means typically comprises a small number of buttons or keys. 

This combination of serial visual presentation, synthesised speech and operation with a small 
number of buttons or keys allows use of the system as a user interface for small hand-held or 
body-worn devices providing access to information services and/or reading material, where the
physical size of the device necessitates a small display and a few buttons. 

This combination of serial visual presentation and synthesised speech allows use of the system 
as a user interface to electronic text for people with a visual impairment, a visual processing 
impairment (such as dyslexia), a general learning difficulty, illiteracy, or any other difficulty in 
reading the written word in the language of the system (for example if this language is their 
second language). The system can also be used for language learning. 

The operation with a small number of operations allows the system to be used by a person 
with limited manual dexterity. It allows for a simple variant with an alternative input based on 
speech recognition of a few commands corresponding to each of these operations. 

An embodiment of the invention has four basic operations and a further operation to change 
mode, so can be operated using five keys, buttons or voice commands. Typically the four 
basic operations would be activated by the four arrow or "cursor" keys: Up, Down, Left and 
Right. The fifth key might be the tab key. 

The serial visual output of the system can be accompanied by visual display of significant text 
from, or associated with, the document being read via the serial output. Such text can include 
headings in the document, the link text from hypertext links embodied in the document, and 
the title of the document. Each piece of significant text can have one or more associated 
values; for example the link text has an associated URL but could also have a binary value
indicating whether the target of the link has been visited or not. 

The serial visual output of the system can be accompanied by visual indication of the values of 
parameters associated with the serial presentation, both visual and audio. This visual 
indication has a textual form, such as a percentage number, but may also have an analogue 
form, such as a progress bar. 

The abovementioned pieces of significant text with their associated values, and the above
parameters with their associated values, are stored as lists, which can be examined by the user
through list operations. The system allows values to be changed, and items to be added to or
deleted from the lists, if it is appropriate to do so.

One aspect of the invention is that parameters, lists and embedded objects can all be examined
and manipulated using the same means of navigation and control. Thus a single paradigm can
cover the whole system; this makes the system easier to use and to learn to use.

Normally, at any given moment, there is one document which is the principal text, and this is
presented through the "main field". Information about this document can be extracted and
inserted into lists. One of the lists can be a list of hypertext links within the document, another
can be a list of headings within the document. Such lists can be considered as
"meta-documents", as they are documents about documents. In the case of the headings list, it
describes the structure of the principal document.
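
The extraction of heading and link lists described above could be sketched as follows, using the HTMLParser class from the Python standard library. The class and function names are assumptions for illustration. The sketch does not handle a link nested inside a heading.

```python
from html.parser import HTMLParser


class MetaListBuilder(HTMLParser):
    """Collect heading texts and (link text, URL) pairs from an HTML document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # list of heading texts
        self.links = []      # list of (link text, URL) pairs
        self._tag = None     # tag whose text is being collected, if any
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "a":
            self._tag = tag
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Collapse runs of whitespace inside the collected text.
        text = " ".join("".join(self._text).split())
        if tag == "a":
            self.links.append((text, self._href))
        else:
            self.headings.append(text)
        self._tag = None


def build_meta_lists(html):
    """Return (headings, links) for the given HTML text."""
    builder = MetaListBuilder()
    builder.feed(html)
    builder.close()
    return builder.headings, builder.links
```

Each returned list can then be presented and stepped through in the same way as any other document.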

The presentation parameters and their values can also be grouped together into lists, which can 
be considered as style sheets - another kind of meta-document. This kind of meta-document 
can also be used for changing the values of parameters, and thus change the characteristics of 
the presentation of the principal document.

The document may contain embedded objects, such as tables, which have a structure in their
own right. When the user reads through the document and reaches an embedded object, the
system provides a view of the embedded object as a self-contained document nested inside the 
"outer" document. 

The principal document and its associated meta-documents and nested documents can each be
navigated and controlled as a document. The system provides the means for changing of focus
(for user navigation and control) between the principal document and the other documents.

Each of these associated documents can be serially presented (e.g. through a word-by-word
display or speech) through the same output means. Thus for visual presentation, the same
field can be used for the display; so effectively this field is time-shared between the display of
the words from the principal document and from the other documents, depending on the
current focus for navigation and control. The style of presentation may be changed while on
another document, so that the user is aware that they are not dealing with the principal
document. For example a different voice or voice pitch can be used for speech, and a
different character size for text display.

However the system also allows the user to continue serial presentation of the principal
document while altering the focus to one of the other documents. This allows one of these
other documents to be navigated and changed, which may have an instantaneous effect on the
position or style of presentation within the principal document. For example if the current
heading is changed in the heading list, the position of presentation changes so that presentation
continues at the point of that heading. As another example, if the colour is changed for a
presentation parameter, then the visual presentation of the principal document continues in that
colour. The navigation and control of the "other" document is performed in parallel with the
presentation of the principal document.

An embodiment of the system can handle multi-way communication by Internet Relay Chat.
The log of the conversation or "conference" is treated as the principal document which is
appended as text arrives from parties to the discussion. The input by the user is treated as an
associated document which is cleared as the input is sent. The focus can be moved between
these two documents. The user has the options (a) of having the presentation of the input as
the user types it interrupting any presentation of the text arriving (from other parties to the
discussion) or conversely (b) of having the presentation of the principal document interrupting
the input as new text arrives or (c) of having the presentation move with the focus under user
control.

An embodiment of the system can handle the presentation of synchronised multimedia, where
the synchronisation and relative timing of parallel output streams may be defined by a mark-up of
the material to be presented according to a standard such as SMIL. Such streams typically
include text and accompanying audio, images and video. The system allows the user to
change position in any one stream, and adjust the position in the other streams according to
defined rules of synchronisation and relative timing. For example, if there are pictures
associated with chunks of text, and the user changes position of reading of text from one
chunk to another, then the picture will change accordingly; while conversely if the user
changes one picture for the next, the position of the reading of the text is moved to the
beginning of the next chunk.

An embodiment of the system includes means to synchronise audio output of words of
recorded speech with visual output of words in a serial display of the text which was spoken.
This allows the word-by-word display and control to be used in multimedia presentation,
where one of the streams is recorded speech, e.g. for a talking book. The user can navigate
the text, and control the style of presentation, while maintaining the synchronisation at the
word level. The means of synchronisation can include: a speech recognition engine which
converts speech to text; a correlator which compares and aligns this text with the text being
spoken; a marking process whereby the digitised speech recording is marked with
word-beginning and/or word-ending codes; and a search engine that looks for these marks when
speech output is to accompany visual output of the words.
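
The correlator step could be sketched as below. The sketch assumes that a recognition engine has already produced a list of (word, start, end) triples; that input format and the function names are hypothetical. SequenceMatcher from the Python standard library is used as one possible alignment method, not necessarily the one intended by the specification.

```python
from difflib import SequenceMatcher


def normalise(word):
    """Lower-case a word and strip surrounding punctuation for comparison."""
    return word.lower().strip(".,;:!?\"'()")


def align(text_words, recognised):
    """Attach timings from recognised words to the words of the known text.

    recognised is a list of (word, start, end) triples. The result has one
    (word, start, end) triple per text word; words that the recogniser
    missed or misheard are marked (word, None, None).
    """
    a = [normalise(w) for w in text_words]
    b = [normalise(w) for w, _, _ in recognised]
    marked = [(w, None, None) for w in text_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognised[block.b + k]
            marked[block.a + k] = (text_words[block.a + k], start, end)
    return marked
```

Words left without timings could be given interpolated boundaries between their marked neighbours before the recording is marked up.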

The system can be extended to deal with tactile output in the form of dynamic Braille multi- 
character display, Braille single cell display or other tactile display. The tactile output can be 
synchronised with speech, the speed of speech being adjusted to allow the user to read the 
tactile display, e.g. by lengthening gaps between words. As an aid to learning the tactile code, 
the speech may be delayed with respect to the tactile display. 

In general the system can deal with the serial presentation of any set of data streams (or 
sequences) where the data stream can be divided into units of different sizes. For example a 
table can be divided into rows, and rows into individual cells.

3. Some implementation details

3.1 FIVE KEY OPERATION

For use on a keyboard or keypad with four cursor keys, and at least one other control key, 
there is a means of operating the system with five keys. Four of the keys are used for Left, 
Right, Up and Down operations, these operations changing the "state" of reading of a 
sequence of data units (typically textual), considered as a "document". A fifth key is used for 
changing from one document to another, while leaving the first document in its current state, 
i.e. without disturbing the reading process on that document. The first document may be the
"principal" document and the second document may be a meta-document, i.e. a document
about the "principal" document, e.g. controlling the presentation (speech and/or display) of the
principal document.

The state of reading of the sequence of data units can be shown by a diagram, see figure 1.
This state-chart shows how, with four operations, you can control the reading of the sequence
of units, allowing you to:

• pause at any point;

• continue reading from that point;

• step back to the beginning of a unit;

• repeat the reading of a unit;

• step back to the previous unit;

• step forward onto the next unit;

• change the size of the unit above.

The unit size is considered as a level, and, in this example embodiment, there is a document 
level as the top level, with sentence, word and spelling levels underneath. At the spelling 
level, the unit is a single character. The Up and Down operations change the level (i.e. the unit 
size) up and down respectively. 
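
The levels described above could be represented as in the following sketch. The sentence splitter is a crude heuristic, the function names are assumptions, and clamping at the top and bottom levels is one possible choice rather than a stated requirement.

```python
import re

# Levels from largest unit to smallest, as in the example embodiment.
LEVELS = ["document", "sentence", "word", "spelling"]


def units_at_level(text, level):
    """Divide text into the units that are stepped through at the given level."""
    if level == "document":
        return [text]
    if level == "sentence":
        # Heuristic: a sentence ends at ., ! or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if level == "word":
        return text.split()
    if level == "spelling":
        return [c for c in text if not c.isspace()]
    raise ValueError("unknown level: " + level)


def change_level(level, key):
    """'Up' selects a larger unit and 'Down' a smaller one; the ends are clamped."""
    i = LEVELS.index(level) + (1 if key == "Down" else -1)
    return LEVELS[max(0, min(len(LEVELS) - 1, i))]
```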

The Down operations are accompanied by the output of a "system" message, typically in a 
different voice for speech output to distinguish it from the voice used in reading the document 
itself. Such system messages provide feedback to the user as to the state of the system and
the level. 

Right is used for moving forward through text, and Left for moving backward. These 
operations would be interchanged for use in a language like Arabic, where printed text is read 
from right to left. 

Other interchange of operations may be appropriate for other situations, e.g. Up and Down 
instead of Left and Right for moving backwards and forwards through Chinese text. 

3.2 ALTERNATIVE FIVE KEY OPERATION

An alternative state chart is shown in figure 2, which provides these same capabilities. In this 
implementation there is feedback for all level change operations. 

Left is used for pausing and then stepping left. Right is used for playing a unit and then
stepping right. After playing to the end of a unit, a Left takes you into the pause state with a
message to say that you have reached the end of the unit, and a Right starts you reading the
next unit (unless you are at the end of the document, in which case you are told this). In
general, after a Left, the Right will continue the reading from the point reached; the following
Right will start reading the next unit. In general, after a Right, the Left will pause the reading,
and the following Left will step you back to the first word of the unit; but if you are on the
first word it will step you to the first word of the previous unit.

Up and Down are used for going up and down a level. Note that in this alternative, the Left 
and Right have no effect on the level. The fifth key can have the same effect as before - to 
change you from one document (or meta-document) to another. 
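
The Left and Right behaviour described in this section could be driven by a transition table, as in the sketch below. Figure 2 is not reproduced here, so the state names, the table and the method names are assumptions based only on the prose. The Up, Down and fifth-key operations are omitted for brevity.

```python
# (state, key) -> (next state, name of the action method to call).
TRANSITIONS = {
    ("paused", "Right"): ("playing", "resume"),
    ("playing", "Right"): ("playing", "next_unit"),
    ("unit_end", "Right"): ("playing", "next_unit"),
    ("playing", "Left"): ("paused", "hold"),
    ("unit_end", "Left"): ("paused", "announce_unit_end"),
    ("paused", "Left"): ("paused", "step_back"),
}


class Reader:
    def __init__(self, units):
        self.units = units      # list of units, each a list of words
        self.unit = 0           # index of the current unit
        self.word = 0           # index of the current word within the unit
        self.state = "paused"
        self.messages = []      # system messages for the user

    def press(self, key):
        next_state, action = TRANSITIONS[(self.state, key)]
        self.state = next_state
        getattr(self, action)()

    def advance(self):
        """Called when a word has finished playing; moves on by one word."""
        if self.state != "playing":
            return
        if self.word < len(self.units[self.unit]) - 1:
            self.word += 1
        else:
            self.state = "unit_end"

    def resume(self):
        pass                    # reading continues from the point reached

    def hold(self):
        pass                    # reading stops at the current word

    def announce_unit_end(self):
        self.messages.append("End of unit")

    def next_unit(self):
        if self.unit < len(self.units) - 1:
            self.unit += 1
            self.word = 0
        else:
            self.messages.append("End of document")
            self.state = "paused"

    def step_back(self):
        if self.word > 0:
            self.word = 0       # back to the first word of this unit
        elif self.unit > 0:
            self.unit -= 1      # already on the first word: previous unit
            self.word = 0
        else:
            self.messages.append("Start of document")
```

Because the behaviour is held in a table rather than in code, a different table could be substituted to give a simplified interface.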

3.3 THREE KEY OPERATION 

Note that by having the level changes in a cycle, one can replace the Up and Down operations 
with a single operation. This allows operation with three keys plus a fourth to change 
document. 

A further reduction to three keys is possible. The Left and Right operations are as in Figure 2, 
while you are on a level. But after the third key is operated, the Left and Right are used to 
step left and right in a list of options, some of which may be to change level, others to change 
to an associated document, others to take a link (e.g. if the pause position in the document is 
over link text) which may be to another document, and others to perform operations 
appropriate to the level and/or to the type of document or meta-document that you were 
reading. One of the options may take you to a further list of options. There will normally be 
an option to take you back to where you were in the document, with a null action. The third 
key is operated a second time to take the action associated with the option selected using the 
Left and Right operations. 

These types of operation might be most suitable in an interface for a very small device, such as 
a wearable computer worn on the wrist like a watch, see below. 

3.4 WRIST-WORN EMBODIMENT

An embodiment is worn on the wrist like a wrist-watch. The buttons are placed such that they
can be reached and pressed by fingers of the other hand, reaching around the wrist, see figure 
4. This enables the user to read documents and control their presentation while viewing the 
text at the same time. This embodiment can use one of the operation means described above, 
with three to five buttons/keys. 

3.5 SYNCHRONISATION 

Synchronisation of the serial visual display with speech depends on how and when the speech 
is generated. 

In the case of synchronisation with synthesised speech, before (or after) each word is output 
from the speech synthesiser, an interrupt message is sent to the display processing software to 
change the display to show this (or the next) word, or, for multi-word display, to move onto 
the next group of words if appropriate. In the case of pre-synthesised speech, or in the case of 
natural speech which has been recorded and marked with word boundary markers, the display 
is moved on to the next word as a word boundary is reached in playing the encoded speech 
image (typically a sentence). The word boundaries may be indicated by codes embedded in 
the encoded speech file, or by codes in a parallel file. Such a parallel file could contain timing 
information about when each word finishes in relation to the start of sentence.

3.6 CHANGING PARAMETER VALUES

The meta-document allows user input to change parameters. An HTML form with radio buttons
and select options is treated in a similar manner.

Each parameter has a list of values. When a parameter is interrogated, the current value is 
simultaneously displayed and read out. A value can be changed by viewing the list of values 
and selecting a different value. In the five key implementation, the Left and Right operations 
may be used to view the list and the Down operation may be used to select a value. 
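
A parameter with its list of values could be modelled as in this sketch, in which Left and Right move through the list and Down selects the value being viewed. The class name, the clamping at the ends of the list and the returned message format are assumptions for illustration.

```python
class Parameter:
    """A presentation parameter with a list of possible values."""

    def __init__(self, name, values, current=0):
        self.name = name
        self.values = list(values)
        self.current = current   # index of the value in force
        self.viewed = current    # index of the value being viewed

    def press(self, key):
        """Handle a key and return the text to display and read out."""
        if key == "Left":
            self.viewed = max(0, self.viewed - 1)
        elif key == "Right":
            self.viewed = min(len(self.values) - 1, self.viewed + 1)
        elif key == "Down":
            self.current = self.viewed   # select the value being viewed
        return "{} {}".format(self.name, self.values[self.viewed])
```

Viewing a value does not change the presentation; only the Down operation puts the viewed value into force.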

3.7 LISTS AND LISTS OF LISTS 

The meta-document is a structure containing lists. Such lists may contain names and 
addresses, words and their pronunciations, etc. Lists of lists can be used to implement tables. 
A list can be considered as a single unit at one level, items or rows at the next level, words at 
the next level, and characters at the bottom level. This allows the list to be examined using 
the five key operation above. 

3.8 HIERARCHY - TREE - PRINCIPLES OF OPERATION

All the information in the system can be kept in a single hierarchy of documents, i.e. a single 
directory tree, which can be navigated as a single structure, e.g. by five key operation. This is 
a suitable embodiment for the watch, where the fifth key might be a help key or emergency 
button. 

Parameters and their values can also be arranged within a tree structure, see figure 5. 
It works on the following principles: 

• All functions are on a tree, whose top is called "Menu".

• The tree is navigated using 4 cursor keys: Up, Down, Left, Right.

• The tree is a list of lists of lists, etc.

• List items are arranged left to right; the tree has the root at the top, and leaves at the
bottom.

• You can perform an operation or change a value by going 'Down' from a leaf identifying
that operation or value. You then automatically return to reading the current document.

• If you don't want to perform an operation or change a value, you can exit by going back
'Up' to the top of the tree and then, with a further 'Up', returning to reading the current
document.

• Some lists are circular (e.g. the days of the week, seen when "setting time").

• For non-circular lists of quantitative values, 'Right' increases the value, and 'Left' decreases
the value.
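
The principles listed above could be sketched as follows. The class names and the example tree in the usage note are assumptions; Figure 5 is not reproduced here, so the tree contents are illustrative only.

```python
class Node:
    """A tree item; its children form a list arranged left to right."""

    def __init__(self, name, children=(), circular=False):
        self.name = name
        self.children = list(children)
        self.circular = circular   # whether the list of children wraps around
        self.parent = None
        for child in self.children:
            child.parent = self


class TreeNavigator:
    def __init__(self, root):
        self.node = root
        self.selected = None       # leaf chosen by a 'Down', if any
        self.in_tree = True        # False once focus returns to the document

    def press(self, key):
        """Handle a cursor key and return the name of the current item."""
        if key == "Down":
            if self.node.children:
                self.node = self.node.children[0]
            else:                  # 'Down' from a leaf selects it and exits
                self.selected = self.node
                self.in_tree = False
        elif key == "Up":
            if self.node.parent is None:
                self.in_tree = False   # 'Up' from the top exits, no action
            else:
                self.node = self.node.parent
        elif key in ("Left", "Right") and self.node.parent is not None:
            siblings = self.node.parent.children
            i = siblings.index(self.node) + (1 if key == "Right" else -1)
            if self.node.parent.circular:
                i %= len(siblings)                      # wrap around
            else:
                i = max(0, min(len(siblings) - 1, i))   # stop at the ends
            self.node = siblings[i]
        return self.node.name
```

For example, a tree built as Node("Menu", [Node("Time", [Node("day", [Node("Mon"), Node("Tue")], circular=True)]), Node("Mode")]) can be walked with press("Down"), press("Right") and so on, with the returned name being displayed and spoken at each step.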

For playing the current document, there is a simpler state machine than shown in Figure 1 or 2, 
since the stepping size is chosen from the tree, see Figure 5. 

• The "main field" has a word-at-a-time display of the current document 

• At any moment, the UI "focus" is either on the main field or on the tree. 

• A 'Down 1 moves the focus from the main field to the top of the tree. 

• The text may continue playing in the main field, while the focus is moved to the tree, and 
an operation performed or a value changed. 
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• Either when you select a value by going Down from a leaf, or when you leave the tree by 
going Up from the top of the tree, the focus is returned to the main field. 

• For reading the text in the main field there is a normal mode (corresponding to the 
document level), and a stepping mode (for the other levels). 

• The stepping size is a value on the tree, which can be set. 

Nonnal mode: 

• While the "focus" is on the main field, and in nonnal reading mode (not stepping), 'Right' 
plays the text, then a 'Left' pauses. 

• When the end of text is reached, you are paused on the last word; a further 'Right 1 gives an 
"End of document" message. 

• When paused, a 'Left 1 takes you back a word. 

• If you do a Left' when you are on the first word, you get a "Start of document" message. 
Stepping mode: 

• In stepping mode, after the focus is returned to the main field, a 'Right' continues playing 
the unit, and a 'Left' takes you to the start of the current unit. 

• A 'Right' after another 'Right' takes you to the next unit, unless you in the last unit, when 
you get "End of document". 

• A Left' after another 'Left' takes you to the start of the previous unit, unless you are in the 
first unit, when you get "Start of document". 

• A 'Right' after a 'Left' plays the current unit.

• A 'Left' after a 'Right' takes you to the start of the current unit.
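The stepping-mode rules above can be sketched as a small state holder that remembers the previous key. This is only an illustrative sketch: the class name `SteppingReader`, the unit indices and the returned message strings are assumptions made for the example, not part of the system as specified.

```python
class SteppingReader:
    """Tracks the current unit and the previous key while in stepping mode."""

    def __init__(self, unit_count):
        self.unit_count = unit_count  # number of units (e.g. sentences) in the document
        self.unit = 0                 # index of the current unit
        self.last_key = None          # None just after focus returns to the main field

    def press(self, key):
        """Apply a 'Left' or 'Right' and return a description of what happens."""
        previous, self.last_key = self.last_key, key
        if key == "Right":
            if previous == "Right":   # Right after Right: move to the next unit
                if self.unit == self.unit_count - 1:
                    return "End of document"
                self.unit += 1
            return f"play unit {self.unit}"
        if key == "Left":
            if previous == "Left":    # Left after Left: move to the previous unit
                if self.unit == 0:
                    return "Start of document"
                self.unit -= 1
            return f"go to start of unit {self.unit}"
        raise ValueError(f"unexpected key: {key}")
```

A 'Right' that follows a 'Left' falls through to "play unit", and a 'Left' that follows a 'Right' falls through to "go to start of unit", which matches the last two rules without extra cases.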



3.9 THE TREE ITSELF - See Figure 5 

At the top is "Menu". Below this are the top level functions that are provided by the system:
Time, Mode, Navigation, Controls, and so on. Here is a simplified example of the tree,
fleshing out 'Time' in more detail than the rest. Note that there can be any number of alarms,
and these are numbered 1, 2, etc. Under 'Year' will be its value '2001', which cannot be
changed. However under each alarm is a time and a period which can be
changed. (The mechanism is not shown in the tree.)
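As an illustration only, a fragment of such a tree and its traversal with the four operations might be modelled as follows. The names `Node` and `TreeCursor` are hypothetical, and the sketch assumes that Down moves to the first child and that every list of siblings is circular; selecting a value from a leaf and leaving the tree from the top are not modelled.

```python
class Node:
    """One entry in the tree; leaves are values."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in children:
            child.parent = self


class TreeCursor:
    """Holds the current position in the tree and applies the four operations."""

    def __init__(self, root):
        self.node = root

    def press(self, key):
        node = self.node
        if key == "Down" and node.children:
            self.node = node.children[0]          # assumed: Down goes to the first child
        elif key == "Up" and node.parent is not None:
            self.node = node.parent
        elif key in ("Left", "Right") and node.parent is not None:
            siblings = node.parent.children
            offset = 1 if key == "Right" else -1
            # assumed: siblings form a circular list
            self.node = siblings[(siblings.index(node) + offset) % len(siblings)]
        return self.node.name


menu = Node(
    "Menu",
    Node(
        "Time",
        Node("Current time", Node("Year", Node("2001"))),
        Node(
            "Alarms",
            Node("1. Alarm", Node("Time (of day)"), Node("Period (daily, weekly)")),
            Node("2. Alarm"),
        ),
    ),
    Node("Stepping", Node("Word"), Node("Sentence")),
)
```

For example, starting from "Menu", the presses Down, Down, Right, Down lead to "Time", "Current time", "Alarms" and "1. Alarm" in turn.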

3.10 EDITING AND CHARACTER INPUT



An editing function is required for web navigation in order that the user can input URLs. It is 
also required for HTML forms where there are text entry fields. 

The five key operation for reading can be used at the same time as editing, since the five keys
(four cursor keys plus tab) are independent of typed input of print characters,
carriage-return/line-feed, and typical edit operations (such as control-C for copy). For the blind user,
there can be immediate speech feedback as characters or words are typed, and then the passage
can be read using the cursor keys. The fifth key (say tab) can take you out of edit mode, and
back into normal reading of the principal document.

A special method of input of text is possible using four keys and the tree structure. For the
first letter of a word, you go Down. Then you are presented with a circular list of groups, such
as ABCD, EFGH, IJKLMN, OPQRST, UVWXYZ; these are presented as a branch of the tree,
which you can traverse using Left and Right. You select one group with a Down, and then
you are presented with the individual letters of that group to choose between, and you select
one with a second Down. You are then presented with the circular list of groups again, and
you can go on to select the second letter, third letter, and so on. Thus there is a two-stage
selection for each character. When you finish the word you do an Up, and you can hear the
word spoken to give you feedback. Then you can do a Down to start the next word, or do a
Left or Right for other possibilities, such as selection of punctuation mark, or change to (or
from) capitalisation.
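To make the two-stage selection concrete, the following sketch computes one possible sequence of key presses for a word. It assumes that each list is presented with its first entry in focus and that only Right is used to move along it (Left would sometimes be shorter, since the list of groups is circular); the function name `keys_for_word` is hypothetical.

```python
GROUPS = ["ABCD", "EFGH", "IJKLMN", "OPQRST", "UVWXYZ"]


def keys_for_word(word):
    """Return key presses that enter `word` (letters A-Z only)."""
    keys = ["Down"]  # go Down to start the word; the groups are then presented
    for letter in word.upper():
        group_index = next(i for i, group in enumerate(GROUPS) if letter in group)
        letter_index = GROUPS[group_index].index(letter)
        keys += ["Right"] * group_index + ["Down"]   # stage 1: choose the group
        keys += ["Right"] * letter_index + ["Down"]  # stage 2: choose the letter
    keys.append("Up")  # finish the word, which is then spoken back
    return keys
```

Under these assumptions the word "be" needs eight presses: Down, Down, Right, Down, Right, Down, Down, Up.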

A similar method of input of numbers is possible, but only one stage is needed. Again the
numbers are presented as a circular list. On Down, you are presented with '0', and then a
single Right will give you '1', two Rights give you '2', etc.
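The circular list of digits can be sketched in the same way. The function name `digit_after` is hypothetical, and the wrap-around from '0' to '9' on a Left is inferred from the list being described as circular.

```python
DIGITS = "0123456789"


def digit_after(presses):
    """Return the digit in focus after a series of Left/Right presses, starting at '0'."""
    position = 0
    for key in presses:
        if key not in ("Left", "Right"):
            raise ValueError(f"unexpected key: {key}")
        step = 1 if key == "Right" else -1
        position = (position + step) % len(DIGITS)  # circular list
    return DIGITS[position]
```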

3.11 LEARNING 

There are various modes of use of the invention, in which modalities can be combined:

• output modalities are in parallel and synchronised, e.g. speech with word display; 

• using a modality as a check or prompt, e.g. speech for the reader to check a word; for 
example the user can click on a particular button (e.g. fifth button) to hear the word, and 
this button is recorded in a word list; 

• using one modality to follow another, e.g. so the user sees the word before hearing it; 

• with the user inputting in one modality, before, during or after the system has output a 
word in the same or another modality, e.g. so the user sees the word and tries speaking it.

The system can record the user's input, either as separate words or as phrases or sentences, and 
then synchronise with the corresponding output of the system in a different modality. For 
example a user can type words as they are spoken, and then play back with the speech and text 
displayed in synchrony. Or the user can speak words as they are displayed, and then play back 
the text with their own speech synchronised to it. If the recording is done a word at a time, the
synchronisation can be precise. An instructor can listen and watch the playback, to spot errors. 

The system can thus be used as a language laboratory. For example the user can read silently, 
then read with the synthesised or instructor's voice, then read a word and record their own 
speech, then hear their speech back, synchronised with the text. The system records each word 
spoken along with the text of the word, so it can be played back at a later moment. There can 
be a record of the instructor's spoken word as well as the student, so they can be played 
together, or alternating. 

4. Mixed paradigm embodiment

In one embodiment of the invention, aimed at universal accessibility, there are three means for 
you (the user) to operate the system: there is the "five key" operation, there is full keyboard 
operation, and there is "point and click" operation, typically using a mouse. 

There is a "main field" which is used for word-by-word display of the document you are 
reading, and also for editing names and values when in "editing mode". 

4.1 SPECIAL NEEDS SUBSET 

The embodiment includes a subset of functionality designed for a person (assumed to be a 
child for purposes of description below) with special needs - specifically for assistance in
reading. It provides five key and mouse operation to give access to a subset of the full system 
functionality. The full functionality is available, via keyboard and mouse operation, to a 
supervisor (assumed to be an adult for purposes of description below). 

4.2 LIST OPERATIONS 

In this embodiment, each list item is a duple (name, value), with "=" as separator. For 
example the bookmark list item is a name and its associated URL address. 

Each list can be scanned either by cursor keys, or by "short cut" operation from the keyboard, 
or by clicking on buttons or fields associated with the list. There are a number of operations 
for acting on a list, e.g. a list of bookmarks: 

Enter (Enter), Cancel (Esc), Copy (control-c), Add (control-a), and Remove (control-d). 

The Copy operation copies the current item to the main field where it can be edited if 
necessary. The item in the edit field can be added to or deleted from certain lists, using Add or
Remove. Such lists have items arranged in alphabetic order. When editing is finished, the 
Escape operation takes you back to reading the document. If the item in the edit field has a 
URL part to it, you can follow this link using the Enter operation, whereupon a new page or 
document is fetched and editing mode is left. 
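A minimal sketch of such a list follows, assuming each item is stored as a single "name=value" string and that alphabetic order means plain string order. The function names and the example.org addresses are hypothetical.

```python
import bisect


def parse_item(item):
    """Split a 'name=value' item at the first '=' separator."""
    name, _, value = item.partition("=")
    return name.strip(), value.strip()


def add_item(items, item):
    """Add: insert the edited item, keeping the list in alphabetic order."""
    if item not in items:
        bisect.insort(items, item)


def remove_item(items, item):
    """Remove: delete the item if it is present."""
    if item in items:
        items.remove(item)


bookmarks = ["News=http://example.org/news"]
add_item(bookmarks, "Books=http://example.org/books")
```

Splitting at the first "=" only means that a URL which itself contains "=" (for example in a query string) is kept whole as the value.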

4.3 CHILD CONTROLS 

The facilities for the child will allow keyboard or mouse operation of a very limited set of functions:

• play through a document; 

• play, step forward and step backwards a paragraph, sentence or word; 

• step through the characters of a word to spell it. 

The document is divided into units: the document itself, its sections, paragraphs, sentences, 
words and characters. The child can change the unit size or "level", so that the system plays,
or steps, through a document, paragraph, sentence, word or character. 

4.4 CHILD SCREEN

There is a simple window on the screen for the child, showing: 

• the title of the document (such as a book); 

• a percentage showing how far they are through the document; 

• the current section heading; 

• the current word; 

• the set of four arrow keys arranged as inverted T; 

• an indication of the level (document, paragraph, sentence, word or spelling); 

• an indication of the state (paused, playing or stopped at the end of unit).

The current word is shown in the "main field".

4.5 ADULT CONTROL 

The adult can use either mouse or keyboard to access full functionality. 

The adult can set up a booklist and the particular book to be read by the child. The adult can 
navigate hypertext, select links, edit them, etc. The adult can control the display and speech 
parameters, for example the size of text in the main field, and the volume of the speech. 

To select a field, the user can click on the field itself and the system will read the full value in
the field. The user can then click on up and down buttons to change the value. In the case of
navigation fields, the user can press Enter to select the address to go to, or Copy to put it in the
main field for editing. 

For control of display and speech parameters using the mouse, the user has a button which 
brings up a dialogue box giving each parameter name and its value, for example: 

• "Magnify" and approximate character size (in points) of text in the main field; 

• "Text" and its colour; 

• "Paper" (background) and its colour; 

• "Volume" and its value (on scale 0-9); 

• "Modality" and value (see below); 

• "Speed" and its approximate value in words per minute. 

The value is applied when you click on the "OK" button in the box. 

Correspondingly there are "short cut" operations using upper case, lower case and control 
characters: 

• M increases text size, m decreases it, and control-m queries it; 

• T and t take you up and down list of colours for text, control-t queries the colour; 

• P and p take you up and down list of colours for paper, control-p queries the 
colour; 

• V increases the volume, v decreases the volume, control-v queries the volume; 

• + increases the degree of speech, = decreases it, and control-- queries it; 

• S increases the speed, s decreases it, and control-s queries it. 

The effect of the short-cut operations is immediate, i.e. there is no need to press an OK. 
When selecting colours by short-cut, you cannot select text colour the same as the current 
paper colour or vice versa. 
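The immediate effect of the short-cuts, and the rule that text and paper colours may not coincide, can be sketched as below. The colour list, the starting values, the direction taken by upper and lower case, and the wrap-around at the ends of the colour list are all assumptions made for the example; the control-key queries are not shown.

```python
COLOURS = ["black", "white", "red", "blue", "yellow"]  # assumed colour list


def step_colour(current, other, step):
    """Move along COLOURS, skipping the colour already used by the other role."""
    i = COLOURS.index(current)
    while True:
        i = (i + step) % len(COLOURS)
        if COLOURS[i] != other:
            return COLOURS[i]


class Settings:
    def __init__(self):
        self.volume = 5        # on the scale 0-9
        self.text = "black"    # assumed starting colours
        self.paper = "white"

    def shortcut(self, ch):
        """Apply one short-cut character immediately (there is no OK step)."""
        if ch == "V":
            self.volume = min(9, self.volume + 1)
        elif ch == "v":
            self.volume = max(0, self.volume - 1)
        elif ch in ("T", "t"):
            self.text = step_colour(self.text, self.paper, 1 if ch == "T" else -1)
        elif ch in ("P", "p"):
            self.paper = step_colour(self.paper, self.text, 1 if ch == "P" else -1)
```

With the assumed defaults, pressing T moves the text colour from black past white (the current paper colour) to red.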

4.6 MODALITIES

The system has several modalities of operation, with different degrees of speech, including a 
modality where there is no speech. When there is no speech, words are displayed for a time 
which is a function of the speed setting, the type of character and number of each type, with a 
longer time for punctuation. Otherwise the display is synchronised with the synthesised 
speech. 
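The exact timing function for the no-speech modality is not given here. One possible form, purely as an illustrative assumption, weights each character by its type and scales the result by the speed setting; the weights and the `typical_units` constant below are invented for the sketch.

```python
def display_seconds(word, words_per_minute,
                    letter_weight=1.0, digit_weight=1.5,
                    punctuation_weight=3.0, typical_units=5.0):
    """Estimate how long to show `word` when there is no speech to synchronise with."""
    units = 0.0
    for ch in word:
        if ch.isalpha():
            units += letter_weight
        elif ch.isdigit():
            units += digit_weight
        else:
            units += punctuation_weight  # punctuation holds the word longer
    seconds_per_typical_word = 60.0 / words_per_minute
    return seconds_per_typical_word * units / typical_units
```

With these weights a five-letter word at 60 words per minute is shown for one second, and a trailing comma lengthens the display.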

There is also a modality with no text visible in the main field as a word is spoken. The user 
can type in the word, and have it checked for spelling against the word that was spoken. 

There are two independent speed parameters: one for speech when the display is synchronised
to the speech, and one for when the display is not thus synchronised.

When using the short-cut operations, the speed refers to the display speed when there is no 
speech, otherwise it refers to the speech speed to which the display is synchronised. 

4.7 EMBEDDED STRUCTURED OBJECTS

As stated above, a document is divided into units of different sizes. However the document 
may contain a structured object, such as a table, where the units inside the object have a 
different set of values, e.g. 

• table; 

• row; 

• cell; 

• word; 

• formula; 

• characters. 

There is a default value depending on context, e.g. cell for tables. 
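One way to picture the context-dependent unit sets is a lookup keyed by the kind of object being read. The dictionary and function names are hypothetical, and the default of "sentence" for ordinary document text is an assumption; only the "cell" default for tables comes from the description above.

```python
UNITS = {
    "document": ["document", "section", "paragraph", "sentence", "word", "character"],
    "table": ["table", "row", "cell", "word", "formula", "characters"],
}

DEFAULT_UNIT = {
    "document": "sentence",  # assumed default for ordinary text
    "table": "cell",         # default stated for tables
}


def unit_for(context, requested):
    """Keep the requested unit if it exists in this context, else use the default."""
    return requested if requested in UNITS[context] else DEFAULT_UNIT[context]
```

So a reader stepping by sentence who enters a table would, under this sketch, start stepping by cell.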

4.8 CHANGING FOCUS USING KEYBOARD 

In this embodiment, the focus is moved from the main field to another field by typing 
control+letter where the letter is associated with that field. For return to the main field you can 
type Escape. However when selecting an address to go to, the Enter will return you to the 
main field for the page or file that you have fetched. 

You can use tab and shift-tab to move focus between fields. For numeric pad operation you
can use the 1 and 7 keys for "tab" and "shift-tab". 

4.9 CHANGING FOCUS USING MOUSE 

You can change focus by clicking on a field. You can thus change focus back to the main 
field by clicking on the main field. You can also change focus back to the main field by 
clicking on the Escape button. And after Enter, the focus will automatically revert to the main 
field, as for keyboard operation. 

4.10 CHANGING FOCUS WHILE PLAYING 

If you change focus while in the playing state, the system should continue reading out, 
uninterrupted by changes in values. For example you should be able to change the volume 
without interruption of the reading of the main text. 

Note that all fields except the main field and the time are static. The time ticks over regardless
of what is happening in other fields - so it is as if it is a "track" playing in parallel with other 
fields. 

4.11 EDITING MODE 

The editing mode will be entered when you "copy" a value from a field using Copy or control-c,
and the main field will display the value "copied", for you to edit. You exit from editing
mode, either by pressing Escape or by pressing Return. The latter has the result of taking the 
text you have edited and treating the part after the "=" sign as a URL for a page to be fetched, 
which then becomes the principal document. 

4.12 WINDOWS FOR FUNCTIONS

There can be different windows, or pull-down boxes, for different functions. This is important 
for situations where there is limited screen space, since such a window or box can overlay the 
main window. The operations and presentation in the subsidiary windows or boxes may obey 
the same paradigms as in the main window, and their content may be treated as "meta-documents"
associated with the "principal document" being displayed in the main window.
The simplest window is a reading window for the child. There can be other windows or boxes 
for: 

• bookmarks and URL editing; 

• forms; 

• configuration of speech and display parameters; 



• search; 

• conventional scrolled display of original/source text; 

• conventional edit; 

• frames; 

• dictionary. 

It is possible to change from the main (read-only) window to another window, either directly, 
using the numeric pad, or indirectly, with an existing "short-cut" command, using letters. 

4.12.1 READING WINDOW 

This will be as for "layout for child", but without the bookmarks which will now be in a 
separate window. 

4.12.2 BOOKMARKS WINDOW 

This will contain the bookmark list, analogous to the list of links, containing name-URL pairs 
as at present. The main field will be for displaying text read from the list. There will be an 
edit field, also with resizeable characters, but scrolling. 

Whilst in any window, a control-c on a URL or name of a URL-name pair will copy a value 
into the edit field of the bookmarks window and transfer focus to it, ready for editing and/or 
addition to the bookmark list. While in the bookmarks window, the Enter command will 
cause the URL in focus to be used to fetch a file or page, and the focus will be transferred 
back to the reading window's main field, whilst the name-URL pair is added to the return 
stack, with the name appearing in the Return field. 

4.12.3 FORMS WINDOW

When reading a page, and encountering a form, the focus can be transferred to the forms 
window, allowing the form to be filled and submitted. The forms window will have a main 
field for reading fixed text on the form, e.g. questions. There will be a text field for typing 
written answers, and a choice field, allowing multiple-choice selection (with radio button or 
tick box functions). 

The text field will be scrolling, with resizeable characters, as for URL editing in the 
Bookmarks window. 

4.12.4 CONFIGURATION WINDOW 

This will have the same format as the forms window, but without the editing field. It will 
allow values of speech and display parameters to be changed. 

4.12.5 TABLES WINDOW 

When you are reading through an HTML page, and come across a table, you can switch to a 
table window, which will allow you to navigate up and down columns, and across rows, 
seeing the column and row headings. 

4.12.6 SEARCH WINDOW 

This will have a format corresponding to a conventional search or find dialogue box. There 
will be an edit field for typing the search string, and there will be other fields where you can 
select an option (case sensitive, etc.). 

4.12.7 ORIGINALS (SOURCE) WINDOW 

This will allow you to look at the original "source" text, i.e. showing the HTML markings if 
any. It will present the text conventionally in a scrollable window. It will have a cursor
corresponding to the position reached in the Reading window. 

4.12.8 DICTIONARY WINDOW 

This allows you to look at a dictionary definition of a word that appears in the main window. 

Notes on Diagrams

FIGURE 1 

This shows various states, e.g. 10 shows the "Play all through" state (when the system is 
reading through the document without stopping). These states are connected by lines with a 
direction, e.g. 11 and 12. On these lines: 

D = down arrow, U = up arrow, L = left arrow, R = Right arrow 

Ellipses show return to state, to read next or previous unit (sentence, word or character). 

There are levels according to stepping size. The top level is for playing the whole document, 
then there are levels for sentence, word and character. Pause from top level is achieved by D 
(see 11), then U to carry on playing (see 12), or L to play current sentence, or R to play next 
sentence. While playing sentence, you can press U to continue playing to end of document 
(back to top level), or L to play previous sentence, R to play next sentence, or D to pause on a 
word (ready to step through words). 

The system can be extended with levels for intermediate sizes, e.g. section and paragraph 
levels between document level and sentence level. There is a pair of states for each level, as 
for the sentence and word level above, with corresponding interconnections (e.g. paragraph to 
sentence has the same interconnection pattern as sentence to word). 
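The transitions described for Figure 1 between the top level and the sentence level can be written as a lookup table. The state names below are labels chosen for this sketch rather than names used in the figure, only the transitions described above are included, and the limits at the first and last sentence are not handled.

```python
# (state, key) -> (new state, change in sentence index)
TRANSITIONS = {
    ("play all", "D"): ("paused", 0),
    ("paused", "U"): ("play all", 0),
    ("paused", "L"): ("play sentence", 0),        # play the current sentence
    ("paused", "R"): ("play sentence", +1),       # play the next sentence
    ("play sentence", "U"): ("play all", 0),      # continue to end of document
    ("play sentence", "L"): ("play sentence", -1),
    ("play sentence", "R"): ("play sentence", +1),
    ("play sentence", "D"): ("paused on word", 0),
}


def step(state, sentence, key):
    """Return the next (state, sentence index); unknown combinations change nothing."""
    new_state, offset = TRANSITIONS.get((state, key), (state, 0))
    return new_state, sentence + offset
```

Adding a paragraph or section level would then be a matter of adding another pair of states with the same pattern of entries.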

FIGURE 2 

Figure 2a shows the operation of the four cursor keys (Left, Right, Up and Down) in the "five
key operation" embodiment. There are two basic states, 13 and 14, at the higher levels, and
one basic state, 15, at the lower (word and character) levels.

With three key operation, two keys are Left and Right, operating as shown, and the third key
takes you from one of the levels into a list of options, see 16, as shown in Figure 2b. And then the
third key operated again takes the action selected from the options using Left and Right 
operations. 

FIGURE 3 

This shows a window on the screen for a particular embodiment. 

17 shows the title field, 18 shows the heading field, 19 shows the progress indicators, 20
shows the main field, and 21 shows the cursor buttons.

FIGURE 4 

This shows an embodiment as a wrist-worn unit. 22 shows the display. 23 shows one of a
number of buttons on each side of the display for the fingers and thumb to operate. 




FIGURE 5 

Figure 5 shows a tree of values, documents, etc. This would be a typical structure. 
Menu 

Time 

Current time 
Year 
Month 

Day (of month) 
Day (of week) 
Time (of day) 

Exact time (minutes and seconds) 

Alarms 

1. Alarm 

Time (of day) 
Period (daily, weekly) 

2. Alarm 



Chime 

Hourly 

Quarter-hourly 
Off 

Setting the time (with some values and effect of operations below) 

Year 

2001 (Rights take you to 2002, etc., and Lefts to 2000, etc.) 

Month 

July (Rights take you to August, etc., and Lefts to June, etc.) 
Day (of month) 

31 (Right gives you 1, Left gives you 30)
Day (of week) 

Tuesday (Right gives you Wednesday, Left gives you Monday) 

Hour 

16 (Right gives you 17, Left gives you 15)

Minutes 

05 

Seconds 

24 

Stepping 

Word 

Sentence 

Paragraph 

Section 

Column 

Row 

Off (not stepping - as for reading straight through a document) 

Speak (things displayed, in case they can't be seen)

Current word 

Current heading 
Current title

Current URL 

Progress (% through reading a document) 

Navigation

Start (go to start of document) 

End (go to end of document) 

Link 

Back 

Forward 

Documents 

Drives 

A:
B:
C:
.... (root directory with tree below)
(subdirectory)

Display

Normal text

Speed

10 (words per minute)
20

Size

10 (in points)
20

Text colour 
Red 

(degree) 

Blue 

(degree) 

Yellow 

(degree) 

Background 
Red 

(degree) 

Blue 

(degree) 

Yellow 

(degree) 

Swap colours 
Unvisited link 

Visited link 

Headings 

Speech



Normal text 

Speed (speed that each word is spoken)
Gap (proportion of gaps to words)



Volume 

Voice 

Pitch 

Unvisited links 

Visited links 

Headings 

System messages 



Controls (for environment and home appliances) 



Off 

Volume 
Channels 

Front door 

Lock 

Unlock 

Open 

Central heating 



Memoranda 

Bookmarks 
Shopping list 
Engagements 

Phone 

Phone book 

Call (find name and call the number) 
1. ... (First entry) 
2. ... (Second entry) 

Add entry 
Messages 

Past calls (call register) 
Settings 



Off 



TV 

Serial Presentation System - Claims



1 A serial presentation system comprising two or more serial computer output means, 
including a display device such as a computer monitor and a sound output device such as a
loudspeaker; and one or more computer input means by which the presentation of the serial 
output is controlled and by which the content of that output is selected; where the content can 
be a document, and the presentation can be controlled by the input with respect to style, speed, 
and position within the document; and where the serial computer output means includes a 
word-by-word serial visual presentation on the display device which can be synchronised to 
the output on the sound output device of the said words as serial speech presentation, from a
speech synthesiser or using pre-recorded speech, such that each word is displayed as it is
spoken. 

2 A system as claimed in claim 1, including a state machine by which the presentation is 
controlled; where the state machine has states corresponding to the playing of the serial output 
and the pausing of the serial output, and these states may exist at several levels, a level 
typically corresponding to a syntactic unit of a document being read; and where transitions 
between states are caused by operations through the input means, thereby controlling the 
presentation of the serial output, and allowing the user of the system to step backwards and 
forwards by a unit, such as a sentence, word or character of the document being presented, see
figures 1 and 2 for example. 

3 A system as claimed in claim 2, in which the state machine can be altered for different users, 
to provide them with a different set of state definitions and transitions, according to each 
user's preference or capabilities. 

4 A system as claimed in any preceding claim, in which the input means for
control of presentation comprises a small set of keys or buttons as typically found on a keypad 
or on a mobile phone. 

5 A system as claimed in claim 4, in which the buttons are arranged so that they can be 
operated with fingers and thumb while the user is viewing the display, see figure 4. 

6 A system as claimed in any preceding claim, in which the input means for control of 
presentation comprises a small set of on-screen buttons, each being an area on the display 
which the user can select, typically by clicking with a mouse. 

7 A system as claimed in any preceding claim, in which the input means for control of
presentation includes a speech recognition engine capable of recognising any word from a set 
of spoken commands from the user of the system, where some or all of these commands cause 
operations to be performed. 

8 A system as claimed in any of claims 2 to 7, in which the input means allows four basic 
operations, called herein Up, Down, Left and Right corresponding to the four cursor keys on a 
keyboard or keypad, but could be implemented by four commands with speech recognition, or 
by four buttons on a watch, or by the four non-numeric buttons typically found on a mobile 
phone; see figures 1 and 2. 

9 A system as claimed in claim 8, in which the four basic operations are used to navigate a tree 
structure, using Up and Down to go up and down the tree, and Left and Right to select a 
branch or a 'leaf' of the tree, where a leaf is typically a value of a presentation parameter, such
as speed, see figure 5. 

10 A system as claimed in claim 8, but in which there are only three basic operations: Left
and Right, with the third taking the action selected from the options using the Left and Right,
see bottom of figure 2.

11 A system as claimed in any preceding claim, in which there is an extra operation allowing
a 'principal' document to continue to be serially displayed while the other four operations
are used in navigating the tree, 'meta-document' or nested document, allowing control of the
presentation characteristics of the principal document, such as speed, while it is playing; 
similar parallel presentation being possible for a principal document and other documents, or 
'streams' of serial presentation, which might include: 

• information about this document extracted and inserted into lists, e.g. a list of hypertext 
links within the document, or a list of headings within the document; 

• style sheets; 

• embedded objects, such as tables and hypertext links, which have a structure in their own 
right; 

• communication by Internet Relay Chat, where the log of the conversation or "conference" 
is treated as the principal document, which is appended as text arrives from parties to the
discussion; and the input by the user is treated as an associated document, which is cleared 
as the input is sent; 

• streams defined by a mark-up of the material to be presented according to a standard such 
as SMIL; 

where the system provides the means for changing of focus, for user navigation and control, 
between the principal document and the other documents or streams.

12 A system as claimed in any preceding claim, in which the selection of document for 
presentation involves navigation of documents associated through a directory structure, for 
example as part of a tree - see figure 5 under "documents", and/or through hypertext linkage. 

13 A system as claimed in any preceding claim, in which the display is physically small 
compared to a typical computer display, and would be typically a liquid crystal display 
(LCD), see figure 4 for example; and in which the size of the characters of the words being 
serially displayed can be adjusted such that longer words occupy much of the width of the 
display area of the display device. 

14 A system as claimed in any preceding claim, in which there is a main field for reading the 
document, and subsidiary fields in which structural elements of a document, such as headings, 
and presentation values, such as speed, can be shown, see figure 3. 

15 A system as claimed in any preceding claim, in which there are other modalities: a tactile
output modality, which may be dynamic refreshable Braille, or a code for serial
tactile output of characters or phonemes across fingers and thumb; a symbol modality, when
words are associated with symbols, according to the PCS or Rebus symbol sets for example;
and/or a sign language modality, when words are associated with signs, according to British
Sign Language (BSL) for example. 

16 A system as claimed in any preceding claim, allowing output of a word in different 
modalities to be synchronised, such that, as a word is displayed or spoken, the corresponding 
symbol or sign (if there is one) is displayed in a separate window, there being a database of 
symbols or signs associated with words in a wordlist and in the case where there are several 
symbols for a single word, such as "present", the choices can be displayed together. 

17 A system as claimed in any preceding claim, with capabilities for the use of modalities in
combination: 

• output modalities are in parallel and synchronised, e.g. speech with word display; 

• using a modality as a check or prompt, where the user can check a word by clicking on a 
certain button or pressing a certain key, to obtain that word expressed in a different 
modality; 

• using one modality to follow another, e.g. so the user sees the word before hearing it; 

• with the user inputting in one modality, before, during or after the system has output a 
word in the same or another modality, e.g. so the user sees the word and tries speaking it; 

• the system can record the user's input, either as separate words or as phrases or sentences, 
and then synchronise with the corresponding output of the system in a different modality; 

• a user can type words as they are spoken, and then play back with the speech and text 
displayed in synchrony; 

• the user can speak words as they are displayed, and then play back the text with their own 
speech synchronised to it; 

• the user can read silently, then read with the synthesised or instructor's voice, then read a 
word and record their own speech, then hear their speech back, synchronised with the text, 
the system recording each word spoken along with the text of the word, so it can be played 
back at a later moment; 

• there can be a record of the instructor's spoken word as well as the student, so they can be 
played together, or alternating. 

18 A system as claimed in any preceding claim, allowing input of text using four keys and the 
tree structure for example using the following procedure: 

• with the first letter of a word, you go Down; 

• then you are presented with a circular list of groups, such as ABCD, EFGH, IJKLMN,
OPQRST, UVWXYZ; these are presented as a branch of the tree, which you can traverse 
using Left and Right; 

• you select one group with a Down; 

• and then you are presented with the individual letters of that group to choose between, and 
you select one with a second Down; 

• you are then presented with the circular list of groups again, 

• and you can go on to select the second letter, third letter, and so on - thus there is a two-
stage selection for each character, 

• when you finish the word you do an Up, and you can hear the word spoken to give you 
feedback; 

• then you can do a Down to start the next word, or do a Left or Right for other possibilities, 
such as selection of punctuation mark, or change to (or from) capitalisation; 

with a similar procedure for input of numbers being possible, but only one stage is needed, 
where the numbers are presented as a circular list: on Down, you are presented with '0', and
then a single Right will give you '1', two Rights give you '2', etc.

19 A system as claimed in any preceding claim, in which the speech synthesiser is mounted 
on the web server, but the presentation is still controlled from the user's client: 

• speech is sent from the server as an encoded file, such as a ".wav" file, to the client;

• this file, or a parallel file, has extra coding to mark word boundaries, allowing 
synchronisation of serial visual display with the speech; 

• typically the server sends a sentence in advance of the sentence that the user is reading; 



• the files are stored in the client, for reuse, in case the user wants to step backwards a 
sentence or more; 

• the user can step forwards and backwards a word at a time, and the relevant word is 
extracted from the file, using the word boundary markings; 

or in which the speech synthesiser is split into a front end and a back end, with the text 
converted to a phonetic notation by the front end, and the notation passed from the server to 
the client machine and then converted to speech sounds on the user's client; but where the 
former conversion may be done at an earlier stage, so that a web page contains both a text 
version and a corresponding static phonetic version; and the latter conversion may be 
performed by an applet previously downloaded from the web site, or by a program allowing 
the conversion of a stream of standard phonetic notation from any web site: 

• such a notation might be that used in a pronunciation dictionary; 

• it would typically contain allophones and syllable stress markings, but also word stress and 
intonation, in order to give a complete inflection for each word, and thus for the sentence 
as a whole; 

• the notation can allow a reverse translation, back to words of text, so that the software (e.g. 
applet) on the client can display each word as its synthesised sound is output - i.e. the word 
is displayed at the same time as it is spoken; 

• the notation can also disambiguate words that have several meanings, so for example it can 
use mark-up similar to that proposed for the semantic web, this disambiguation being 
useful if there is a choice of symbol according to the meaning. 

or the same approach can be used with pre-recorded speech, such as produced by an actor for 
a talking book: 

• this speech is divided typically into sentences, 

• each sentence is marked up with word boundaries so that the display can subsequently be
synchronised with the speech; 

• the mark up may be virtual, e.g. by a time indication of when each word ends with respect 
to the beginning of the sentence. 
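
As an illustration of the "virtual" mark-up, word boundaries can be held as a list of word-end times measured from the start of the sentence. The sketch below assumes times in milliseconds and audio held as a plain sequence of samples; the function names and the data layout are hypothetical and are not taken from the specification.

```python
from bisect import bisect_right


def word_at(end_times_ms, t_ms):
    """Index of the word being spoken t_ms after the sentence starts.

    Word i is taken to occupy the interval from the end of word i-1
    (or 0 for the first word) up to, but not including, its own end time.
    """
    i = bisect_right(end_times_ms, t_ms)
    return min(i, len(end_times_ms) - 1)  # clamp times past the last word


def word_span(end_times_ms, i):
    """(start_ms, end_ms) of word i, relative to the sentence start."""
    start = 0 if i == 0 else end_times_ms[i - 1]
    return start, end_times_ms[i]


def word_samples(samples, end_times_ms, i, sample_rate):
    """Extract the audio samples for word i, e.g. for single-word stepping."""
    start_ms, end_ms = word_span(end_times_ms, i)
    return samples[start_ms * sample_rate // 1000:end_ms * sample_rate // 1000]
```

With end times `[200, 550, 900]` for a three-word sentence, `word_at` tells the display which word to show at a given playback time, and `word_samples` pulls out one word's audio when the user steps forwards or backwards a word at a time.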

20 A system as claimed in any preceding claim, including means to synchronise audio output 
of words of recorded speech with visual output of words in a serial display of the text which 
was spoken; this allowing the word-by-word display and control to be used in multimedia 
presentation, where one of the streams is recorded speech, e.g. for a talking book; such that the 
user can navigate the text, and control the style of presentation, while maintaining the 
synchronisation at the word level; and where the means of synchronisation can include: a 
speech recognition engine which converts speech to text; a correlator which compares and 
aligns this text with the text being spoken; a marking process whereby the digitised speech 
recording is marked with word-beginning and/or word-ending codes; and a search engine that
looks for these marks when speech output is to accompany visual output of the words. 
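
The correlator and marking steps might be sketched as below. This is an assumption-laden illustration: the recogniser output format (a list of `(word, start_ms, end_ms)` tuples) is hypothetical, and Python's standard `difflib.SequenceMatcher` is used here as a stand-in alignment technique, not as the method of the specification.

```python
from difflib import SequenceMatcher


def align(reference_words, recognised):
    """Mark each word of the known text with timings from recognised speech.

    reference_words: the words of the text that was spoken.
    recognised: hypothetical recogniser output, a list of
        (word, start_ms, end_ms) tuples.
    Returns one entry per reference word: (start_ms, end_ms) where the
    recogniser's word matched, or None where it did not.
    """
    ref = [w.lower() for w in reference_words]
    rec = [w.lower() for w, _, _ in recognised]
    marks = [None] * len(reference_words)
    matcher = SequenceMatcher(a=ref, b=rec, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start_ms, end_ms = recognised[block.b + k]
            marks[block.a + k] = (start_ms, end_ms)
    return marks
```

Words left as `None` are those the recogniser misheard; a fuller system would need some way to fill these gaps, for instance from the timings of neighbouring words, which this sketch does not attempt.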